Scaling laws for geometrical and dynamical quantities of biological molecules are an interesting
topic. According to Flory's theory, the radius of gyration of a homopolymer chain follows a power
law in the chain length, with exponent 3/5 in good solvent and 1/3 in poor solvent. For proteins
under physiological conditions, a solvent condition in between, a power law with exponent ≈ 2/5
is obtained. In this paper, we present a unified formula that covers all of the above cases. It
shows that the scaling exponent is generally correlated with the fractal dimension of a chain under
a given solvent condition. When the formula is applied to proteins, the fractal dimension is found
to depend on hydrophobicity. By turning a physical process (varying the hydrophobicity of a chain
by amino acid mutation) into an equivalent chemical process (varying the polarity of the solvent by
adding polar or nonpolar molecules), we derive this relation, in reasonable agreement with
statistical data. It should be helpful for protein structure prediction. Our results indicate that
proteins may share the same basic principle with homopolymers, despite their specificity as
heteropolymers.

I. INTRODUCTION

It is well known that a protein can refold to its native
structure from the denatured state under physiological
conditions. However, the underlying mechanism is still
unknown and remains one of the basic intellectual
challenges in molecular biology [1]. In the study of
protein folding, the radius of gyration, defined as

$$R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{R}_i - \langle \mathbf{R} \rangle\right)^2}, \qquad \langle \mathbf{R} \rangle = \frac{1}{N}\sum_{i=1}^{N}\mathbf{R}_i,$$

is introduced as an important quantity. It describes not
only the static compactness of a protein structure, but
also the folding process from the denatured state to the
native state. Experimentally, Takahashi et al. used
small-angle X-ray scattering to measure the time evolution
of $R_g$ during protein folding. In their study,
significant changes in the radius of gyration between
unfolded and folded conformations were observed in several
proteins after a pH jump [2].
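
As an illustration of the definition above, a minimal sketch in Python that computes $R_g$ from a list of 3D coordinates is given here. The function name `radius_of_gyration` and the toy rod of ten equally spaced points are assumptions introduced for this example; they are not part of the original analysis.

```python
import math

def radius_of_gyration(coords):
    """Return R_g for a sequence of (x, y, z) monomer positions."""
    n = len(coords)
    # Geometric center <R> (all monomers weighted equally)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    # Mean squared deviation from the center
    mean_sq = sum(
        sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)

# Hypothetical toy input: ten points on a straight line with spacing 1.5
rod = [(1.5 * i, 0.0, 0.0) for i in range(10)]
print(radius_of_gyration(rod))
```

For real structures the coordinates would typically be one representative atom per residue; that choice is left open here.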

An interesting question concerns the relationship between
$R_g$ and other physical quantities. In this paper, we
present a scaling law between the radius of gyration and
the length of the protein chain ($N$), obtained from the
Protein Data Bank: $R_g \propto N^{2/5}$, which has also
been reported by other authors [8, 9, 10, 11, 12]. By
generalizing Flory's theory [3], we obtain a new unified
formula, which can be applied to polymers in poor solvent,
polymers in good solvent, proteins under physiological
conditions, etc. It shows that the scaling exponent is
generally correlated with the fractal dimension of a chain.
We also study the influence of hydrophobicity on the
compactness of a protein chain. By considering the
equivalence between protein-solvent coupled systems, we
turn a physical process (varying the hydrophobicity of a
chain by amino acid mutation) into a chemical process
(varying the polarity of the solvent by adding polar or
nonpolar molecules). This enables us to derive a relation
between hydrophobicity and fractal dimension, in good
agreement with statistical data.

The paper is organized as follows. In Section II, a scaling
law for the radius of gyration of proteins under
physiological conditions is presented. In Section III, we
derive our new unified formula based on Flory's original
theory. In Section IV, the influence of hydrophobicity on
the fractal dimension is studied. Section V is a brief
conclusion. In the Appendix, the relation between scaling
exponent and hydrophobicity is studied directly, in the
same way as in Section IV.



II. SCALING EXPONENT FOR PROTEIN 
UNDER PHYSIOLOGICAL CONDITION 

If minor differences between amino acids are neglected, a
protein can be treated as a homopolymer. According to the
well-known Flory theory [3, 4, 5, 6], there exists a
universal scaling law between the radius of gyration and
the length of a polymer chain,

$$R_g \propto N^{\nu}, \tag{1}$$
where the exponent $\nu$ depends on the solvent condition.
Under good solvent conditions, monomers are separated by
solvent molecules, and $\nu = 3/5$. Under poor solvent
conditions, the chain is highly compressed by solvent
pressure, and $\nu = 1/3$, i.e., the chain is as compact as
a crystal.

However, proteins under physiological conditions have their
own specificity. On the one hand, they are compact due to
hydrophobic interactions. On the other hand, they are
usually not well packed and contain many cavities inside
[7]. Geometrically, these cavities are a consequence of
regular secondary structures in folded proteins.
Furthermore, they are essential for biological function,
since they can serve as binding sites for other molecules.
Therefore, folded proteins should be more compact than
polymers in good solvent and looser than highly compressed
polymers in poor solvent, i.e., $1/3 < \nu < 3/5$. This
argument is confirmed by a statistical study of over 37,000
protein structures from the Protein Data Bank (PDB), which
yields $\nu \approx 2/5$ (Fig. 1) and agrees with the work
of Arteca [8, 9, 10]. It indicates that proteins in the
native state are not as compact as crystals, which differs
somewhat from the currently popular view [11].
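
The exponent in such a study is the slope of a least-squares line in a log-log plot of $R_g$ against $N$. A minimal sketch of that fit is given below. It is not the code used for the PDB statistics; the function name `fit_scaling_exponent` is hypothetical, and the five data points are the illustrative proteins listed in the caption of Fig. 3, far too few for a meaningful estimate.

```python
import math

def fit_scaling_exponent(lengths, radii):
    """Least-squares fit of log(R_g) = nu * log(N) + log(prefactor)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in radii]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    nu = sxy / sxx                       # slope = scaling exponent
    prefactor = math.exp(y_mean - nu * x_mean)
    return nu, prefactor

# (N, R_g) pairs quoted in the caption of Fig. 3
lengths = [50, 110, 373, 1268, 6577]
radii = [9.7, 13.13, 26.98, 38.74, 71.68]
nu, prefactor = fit_scaling_exponent(lengths, radii)
print(f"nu = {nu:.3f}, prefactor = {prefactor:.2f}")
```

Even this tiny sample gives a slope close to 2/5, although only the full data set supports the quoted uncertainty.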
FIG. 2: Log-log plot of proteins with different secondary
structure. (a) 3080 all-α proteins with $N_\alpha/N > 0.5$
($N_\alpha$ is the number of amino acids in α-helices;
single α-helices are excluded), $\nu = 0.4026 \pm 0.0036$ by
least-squares linear fit. (b) 334 all-β proteins with
$N_\beta/N > 0.5$ ($N_\beta$ is the number of amino acids in
β-sheets), $\nu = 0.3838 \pm 0.00746$. (c) 25804 α/β mixed
proteins with $(N_\alpha + N_\beta)/N > 0.5$,
$\nu = 0.4166 \pm 0.0010$. (d) 839 unstructured proteins with
$(N_\alpha + N_\beta)/N < 0.2$, $\nu = 0.4038 \pm 0.0097$.



FIG. 1: A log-log plot of 37162 protein structures in PDB,
with $\nu = 0.3916 \pm 0.0008$ by least-squares linear fit.

The influence of secondary structure is also studied. In
Fig. 2, we show statistical data for all-α, all-β, and α/β
mixed protein structures (in which the fraction of amino
acids in secondary structures is larger than 50%) and
unstructured proteins (in which it is less than 20%) in
PDB. Despite their great differences in secondary
structure, their scaling exponents $\nu$ are all close to
2/5. This result seems counterintuitive at first glance,
since it is easy to see that $\nu = 1$ for a single straight
α-helix and $\nu = 1/2$ for a perfect planar β-sheet, both
far from $\nu = 2/5$. However, secondary structure is a local
characteristic, while the scaling exponent $\nu$ depends
mainly on overall topological properties. When a protein is
large enough to contain sufficiently many
secondary-structure elements, their influence is quite
limited. These results imply that there may exist a unified
mechanism for the scaling law between the radius of
gyration and the length of the protein chain.
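
The two limiting exponents quoted for a straight helix and a planar sheet can be checked with a short calculation. The following is a sketch under simplifying assumptions introduced here: $N$ point-like residues with spacing $a$ (a symbol not used elsewhere in the paper), placed on a straight line for the helix axis, and on a square lattice with $L = \sqrt{N}$ sites per side for the sheet.

$$\text{rod:}\quad R_g^2 = \frac{a^2 (N^2 - 1)}{12} \;\Rightarrow\; R_g \propto N^{1}, \qquad \text{sheet:}\quad R_g^2 = 2 \cdot \frac{a^2 (L^2 - 1)}{12} = \frac{a^2 (N - 1)}{6} \;\Rightarrow\; R_g \propto N^{1/2}.$$

Both results use the variance of $N$ equally spaced points, $a^2(N^2-1)/12$, applied once for the rod and once per in-plane direction for the sheet.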



III. GENERALIZED FLORY'S THEORY 

To obtain a unified formula for the scaling law valid under
different solvent conditions, we generalize Flory's
original theory [3, 4, 5, 6]. We assume that the chain is
made up of $N$ monomers, which are indistinguishable from
each other. Its overall size is then mainly determined by
two effects: the excluded volume effect, which tends to
swell the chain, and the elastic interaction, which tends
to shrink it.

First, the excluded volume effect is a consequence of
repulsive interactions between monomers, with energy
(two-body repulsive interaction) given by [3, 4, 5, 6]

$$E_{rep} = k_B T\, v\, \frac{N^2}{R_g^3}, \tag{2}$$

where $v$ is the volume of a single monomer.

Next, we calculate the elastic energy. Generally speaking,
this term originates from contact interactions between
monomers, which include the hydrophobic interaction between
monomers and solvent molecules, covalent bonds, hydrogen
bonds, and van der Waals interactions between neighboring
monomers, etc. Since we are unable to give an explicit
formula, we adopt the harmonic approximation to find the
dominant part.

Let $d_{ij}$ be the real distance between monomers $i$ and
$j$. Monomer $i$ is considered to be in contact with monomer
$j$ if $d_{ij} < \delta$, where $\delta > 0$ is some given
constant. Let $d_0$ be the average distance between any two
contacting monomers $i$ and $j$; $d_0$ is independent of the
indices $i$ and $j$ and corresponds to the minimum of the
potential energy. Under the harmonic approximation, the
elastic energy of a chain with $N$ monomers is given by

$$E_{ela} = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\kappa}{2}\,(d_{ij} - d_0)^2\,\chi(d_{ij}), \qquad \chi(d_{ij}) = \begin{cases} 1, & i \neq j \text{ and } d_{ij} < \delta, \\ 0, & \text{else}, \end{cases}$$

where the first factor 1/2 is due to double counting of
monomer pairs, and $\kappa$ is the Hooke coefficient. Define
the root-mean-square contact distance $d$ by

$$d^2 = \frac{1}{n}\sum_{j=1}^{N} d_{ij}^2\,\chi(d_{ij}),$$

where $n = \sum_{j=1}^{N}\chi(d_{ij})$ is the local contact
number. $n$ and $d$ are assumed to be independent of the
index $i$, since all monomers are equivalent in our
treatment. On the other hand, we have

$$\sum_{j=1}^{N} d_{ij}\,\chi(d_{ij}) = n d_0.$$

We can now rewrite $E_{ela}$ as

$$E_{ela} = \frac{1}{4}\sum_{i=1}^{N}\kappa n\,(d^2 - d_0^2) = \frac{1}{4}\kappa n N d^2 - \frac{1}{4}\kappa n N d_0^2.$$

As the second term is independent of $R_g$, it is omitted in
later discussions. Thus, we get

$$E_{ela} = \frac{1}{4}\kappa n N d^2. \tag{3}$$



In general, the root-mean-square contact distance $d$ is a
function of $R_g$ and $N$, $d = d(R_g, N)$, and depends on
the compactness of the chain.

As suggested by many authors, a protein can be regarded as
a fractal to some extent [12, 13, 14, 15, 16, 17]. If there
exists a self-similarity in number density between
small-scale and large-scale structure (Fig. 3), we can
write

$$\frac{n+1}{d^{\alpha}} = \frac{N}{R_g^{\alpha}}, \tag{4}$$

where $\alpha$ stands for the fractal dimension of the
protein structure. Thus the root-mean-square contact
distance is obtained as

$$d = (n+1)^{1/\alpha}\,\frac{R_g}{N^{1/\alpha}}. \tag{5}$$

Substituting into Eq. (3),

$$E_{ela} = \frac{1}{4}\kappa n (n+1)^{2/\alpha}\,\frac{R_g^2}{N^{2/\alpha - 1}}. \tag{6}$$



Hence, the total energy is given by

$$E = E_{rep} + E_{ela} = k_B T\, v\,\frac{N^2}{R_g^3} + \frac{1}{4}\kappa n (n+1)^{2/\alpha}\,\frac{R_g^2}{N^{2/\alpha - 1}}. \tag{7}$$
FIG. 3: A log-log plot for 5 illustrative proteins, selected
randomly from PDB subject to having appropriate chain
lengths ($N = 50$, $R_g = 9.7$ for 1PTQ; $N = 110$,
$R_g = 13.13$ for 1BFE; $N = 373$, $R_g = 26.98$ for 1AGQ;
$N = 1268$, $R_g = 38.74$ for 1KX5; $N = 6577$, $R_g = 71.68$
for 1JJ2). The data are obtained by counting the number of
amino acids ($n$) within different given radii ($r$)
starting at the geometrical center. There is clearly a wide
self-similarity region between small-scale and large-scale
structure, with exponent $\alpha \approx 2$. Beyond this
region, $\alpha$ drops to 1 quickly, due to the finite chain
length.
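
The counting procedure described in this caption can be sketched as follows. This is an illustrative implementation under assumptions made here, not the authors' code: the function names are hypothetical, the center is the unweighted mean position, and the test object is a filled planar lattice rather than a protein.

```python
import math

def count_within(coords, radius):
    """Number of points within `radius` of the geometric center."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return sum(1 for p in coords if math.dist(p, center) <= radius)

def mass_radius_exponent(coords, radii):
    """Least-squares slope of log n(r) versus log r."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(count_within(coords, r)) for r in radii]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy object: a filled 41 x 41 planar lattice, for which the slope should be near 2
sheet = [(i, j, 0) for i in range(-20, 21) for j in range(-20, 21)]
print(mass_radius_exponent(sheet, [4, 8, 16]))
```

The radii must stay inside the self-similar region; once $r$ exceeds the size of the object the count saturates and the fitted slope is no longer meaningful.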



In the above argument, we neglect many other effects
[3, 4, 5, 6], such as the entropy effect
($E_{entropy} \propto R_g^2/N \propto N^{2\nu - 1}$) and the
three-body repulsive interaction
($E_{three} \propto N^3/R_g^6 \propto N^{3 - 6\nu}$), etc.
Nevertheless, in the region we are interested in
($\nu \in (1/3, 3/5)$, $\alpha \in (1, 3)$), it is easy to
check that $\lim_{N\to\infty} E_{entropy}/E = 0$ and
$\lim_{N\to\infty} E_{three}/E = 0$.
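
The two limits can be verified by power counting. The sketch below assumes $R_g \propto N^{\nu}$, the scalings $E_{entropy} \propto N^{2\nu-1}$ and $E_{three} \propto N^{3-6\nu}$ stated in the text, and that the total energy scales like its two-body term $N^2/R_g^3$:

$$E \propto N^{2 - 3\nu}, \qquad \frac{E_{entropy}}{E} \propto N^{(2\nu - 1) - (2 - 3\nu)} = N^{5\nu - 3}, \qquad \frac{E_{three}}{E} \propto N^{(3 - 6\nu) - (2 - 3\nu)} = N^{1 - 3\nu}.$$

The first ratio vanishes for large $N$ when $\nu < 3/5$, and the second when $\nu > 1/3$, so both corrections are subleading exactly on the open interval considered.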
In the equilibrium state, the radius of gyration can be
estimated by minimizing the total energy $E$. Setting
$dE/dR_g = 0$, we have

$$R_g = \left[\frac{6 k_B T v}{\kappa n (n+1)^{2/\alpha}}\right]^{1/5} N^{\frac{\alpha + 2}{5\alpha}} \propto N^{\nu}, \tag{8}$$

which gives

$$\nu = \frac{\alpha + 2}{5\alpha}. \tag{9}$$
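
The minimization can be checked numerically. In the sketch below, the function names and the parameter values (`kT_v = 1`, `kappa = 1`, `n_contact = 6`, chain length 500) are arbitrary illustrative assumptions; only the functional form of the two energy terms is taken from the text.

```python
import math

def total_energy(r_g, n_mono, alpha, kT_v=1.0, kappa=1.0, n_contact=6):
    """Two-body repulsion plus elastic term, as in Eq. (7)."""
    e_rep = kT_v * n_mono ** 2 / r_g ** 3
    e_ela = (0.25 * kappa * n_contact * (n_contact + 1) ** (2 / alpha)
             * r_g ** 2 / n_mono ** (2 / alpha - 1))
    return e_rep + e_ela

def equilibrium_radius(n_mono, alpha, kT_v=1.0, kappa=1.0, n_contact=6):
    """Closed-form minimizer of total_energy in r_g, as in Eq. (8)."""
    prefactor = (6 * kT_v
                 / (kappa * n_contact * (n_contact + 1) ** (2 / alpha))) ** 0.2
    return prefactor * n_mono ** ((alpha + 2) / (5 * alpha))

def local_exponent(n_mono, alpha):
    """Effective exponent d ln R_g / d ln N, estimated from N and 2N."""
    r1 = equilibrium_radius(n_mono, alpha)
    r2 = equilibrium_radius(2 * n_mono, alpha)
    return math.log(r2 / r1) / math.log(2)

for alpha in (1, 2, 3):
    r_star = equilibrium_radius(500, alpha)
    # The energy should rise on both sides of the closed-form radius
    is_minimum = all(
        total_energy(r_star, 500, alpha) < total_energy(f * r_star, 500, alpha)
        for f in (0.9, 1.1)
    )
    print(alpha, round(local_exponent(500, alpha), 4), is_minimum)
```

Because both energy terms are convex in $R_g$ for $R_g > 0$, the stationary point is the unique minimum, which is what the `is_minimum` check probes.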



In Fig. 4, we can see that the classical Flory theory
appears as limiting cases of our new formula. In good
solvent, the polymer becomes loose and can be modeled as a
one-dimensional long chain. Thus $\alpha = 1$, which gives
$\nu = 3/5$. In poor solvent, the polymer is highly
compressed by solvent pressure and becomes as well packed
as a crystal. This means $\alpha = 3$, and then $\nu = 1/3$.
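
The relation $\nu = (\alpha + 2)/(5\alpha)$ and its inverse $\alpha = 2/(5\nu - 1)$ are easy to tabulate. The sketch below uses exact rational arithmetic; the function names are introduced here for illustration, and the inverse is obtained by solving the same relation for $\alpha$.

```python
from fractions import Fraction

def nu_from_alpha(alpha):
    """Scaling exponent for a given fractal dimension: (alpha + 2) / (5 alpha)."""
    return (alpha + 2) / (5 * alpha)

def alpha_from_nu(nu):
    """Inverse relation, alpha = 2 / (5 nu - 1); requires nu > 1/5."""
    return 2 / (5 * nu - 1)

for alpha in (1, 2, 3):
    nu = nu_from_alpha(Fraction(alpha))
    print(alpha, nu, alpha_from_nu(nu))
# prints:
# 1 3/5 1
# 2 2/5 2
# 3 1/3 3
```

The inverse is the transform needed to convert a measured exponent back into an effective fractal dimension.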

For proteins under physiological conditions, we have
$\alpha \approx 2$ (Fig. 3), so $\nu \approx 2/5$. This
suggests that many amino acid residues ($\propto N$) are
distributed at the surface of a protein, and that the
interior is not as compact as previously thought. This
result is also supported by other studies
[9, 12, 13, 14, 15, 16, 17].
FIG. 5: Hydrophobicity distribution of 33901 protein
structures in PDB. The data are fitted by a Gaussian curve
$Y = 0.094\, e^{-(X - 0.5)^2/(2 \times 0.054^2)}$.



FIG. 4: The blue solid curve is Eq. (9). The red squares
mark three ideal cases: polymer in good solvent
($\alpha = 1$), protein under physiological conditions
($\alpha = 2$), and polymer in poor solvent ($\alpha = 3$).
The crosses are statistical values of the exponent $\nu$ for
proteins with different hydrophobicity ($h$). $\nu(h)$ is
estimated by least-squares linear fitting of uniformly
selected statistical data (proteins with the same
hydrophobicity within ±0.005) from PDB. Data for $h < 0.25$
and $h > 0.75$, as well as $h = 0.275, 0.675, 0.725$, are
missing due to inadequate samples.



IV. DEPENDENCE ON HYDROPHOBICITY 

The above deduction is based on the assumption of a
homopolymer. In fact, however, a protein is a heteropolymer
made up of twenty different kinds of amino acids. Thus the
exponent $\nu$ generally depends on the composition of the
chain, especially its hydrophobicity. To study this effect,
a simple H-P model is introduced. Here we adopt the
classification of Kyte and Doolittle [18]. All amino acids
with positive values in the K-D method are regarded as
hydrophobic (I, V, L, F, C, M, A, G), while those with
negative values are regarded as hydrophilic (T, S, W, Y,
P, H, E, N, Q, D, K, R).

The fraction of hydrophobic amino acids in a protein is
defined as its hydrophobicity ($h$). If all amino acids are
hydrophilic ($h = 0$), which corresponds to the good solvent
condition, the protein will be fully extended with
dimension $\alpha = 1$. In this case, constraints arising
from covalent bonds are the dominant interaction opposing
the swelling tendency. If all amino acids are hydrophobic
($h = 1$), corresponding to the poor solvent condition, the
protein is highly compressed by solvent pressure and has
dimension $\alpha = 3$. Strong hydrophobic interactions are
balanced by the excluded volume effect between amino acid
residues.
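
The hydrophobicity $h$ of a sequence follows directly from this two-class scheme. The sketch below uses the residue lists exactly as given in the text; the function name, the example sequence, and the choice to skip characters outside the twenty listed residues are assumptions made for this illustration.

```python
# Two-class (H-P) residue sets, copied from the lists given in the text
HYDROPHOBIC = set("IVLFCMAG")
HYDROPHILIC = set("TSWYPHENQDKR")

def hydrophobicity(sequence):
    """Fraction h of hydrophobic residues in a one-letter amino acid sequence."""
    # Assumption: characters outside the 20 listed residues are ignored
    residues = [aa for aa in sequence.upper()
                if aa in HYDROPHOBIC or aa in HYDROPHILIC]
    if not residues:
        raise ValueError("sequence contains no recognized amino acids")
    return sum(aa in HYDROPHOBIC for aa in residues) / len(residues)

# Hypothetical example sequence: 7 of its 9 residues are in the hydrophobic set
print(hydrophobicity("MKVLAAGIT"))
```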



For natural proteins, the hydrophobicity has a
Gaussian-like distribution (Fig. 5). In the region
$h \in [0.4, 0.6]$, the scaling exponent is almost unchanged
(Fig. 4), $\nu \approx 2/5$. For $h < 0.4$ or $h > 0.6$, the
number of natural proteins is quite limited, and the
corresponding exponent $\nu$ varies widely. In particular,
for $h < 0.25$ or $h > 0.75$, the proteins can be regarded as
totally hydrophilic or hydrophobic, respectively. These
results hint that appropriate hydrophobicity is essential
to maintain the overall structure of a natural protein.

To study how hydrophobicity affects the scaling exponent,
we try to theoretically predict $\alpha = \alpha(h)$,
$h \in [0, 1]$.

We denote the state of the protein-solvent coupled system
by $X\{\text{hydrophobicity of protein}, \text{polarity of solvent}\} = X\{h, p\}$.
Two proteins with different hydrophobicity in the same
water solution are then related by

$$X\{h = h_0, p = 0\} \longleftrightarrow X\{h = h_1, p = 0\}. \tag{10}$$



Here the polarity of the water solution is set to zero. If
we know how the above two states change into one another,
we can predict the relation $\alpha(h)$. However, this
process proceeds by amino acid mutation, which is not a
chemical reaction and is not easy to analyze. Here we adopt
an alternative approach, based on the assumption that
varying the hydrophobicity of a protein is equivalent to
varying the polarity of the solvent. According to
biochemistry, the hydrophobicity of a protein is closely
related to the polarity of the solvent: the more polar the
solvent is, the more hydrophobic the protein is, while the
less polar the solvent is, the more hydrophilic the protein
will be. Thus we can assume that the following two systems
are equivalent:

$$X\{h = h_1, p = 0\} = X\{h = h_0, p = m(h_1 - h_0)\}. \tag{11}$$

Here we adopt a linear relationship between hydrophobicity
and polarity; its validity remains to be verified by
experiments. From Eqs. (10) and (11), we can turn a
physical process (varying the hydrophobicity of a chain by
amino acid mutation) into a chemical process (varying the
polarity of the solvent by adding polar or nonpolar
molecules) (Fig. 6). Thus, Eq. (10) is equivalent to the
following process:

$$X\{h = h_0, p = 0\} \longleftrightarrow X\{h = h_0, p = m(h_1 - h_0)\}. \tag{12}$$

FIG. 6: Illustration of our main idea for studying
$\alpha(h)$. Processes (1)-(3) correspond to Eqs. (10)-(12),
respectively. Process (1) is what we want to study.
However, varying the hydrophobicity of a protein by amino
acid mutation is not a chemical process and is hard to
analyze. Process (2) is our main assumption: varying the
hydrophobicity of a protein is equivalent to varying the
polarity of the solvent. Process (3) is a real chemical
reaction: by adding polar or nonpolar molecules, we can
control the polarity of the solvent.

The study of $\alpha(h)$ is now reduced to a chemical
reaction. Suppose there are three separate stable thermal
states $X_1$, $X_2$, $X_3$, which represent proteins in good
solvent, under physiological conditions, and in poor
solvent, respectively. Their corresponding fractal
dimensions are $\alpha(X_1) = 1$, $\alpha(X_2) = 2$, and
$\alpha(X_3) = 3$.

We start from the state under physiological conditions.
When the condition is changed from water solution to good
solvent, which can be done by adding nonpolar molecules
($N$), proteins change from the $X_2$ state to the $X_1$
state according to the following chemical process:

$$X_2 \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} X_1. \tag{13}$$



Here the reaction constants $k_1$ and $k_{-1}$ depend on the
concentration of nonpolar molecules added ($[N]$ is
normalized to lie in $[0, 1]$). Let $[X_i]$ be the fraction
of proteins in state $X_i$. When the system reaches
equilibrium, we have

$$[X_1]/[X_2] = k_1/k_{-1} = K_1([N]). \tag{14}$$

Here, a power function is chosen for the above relation:

$$K_1([N]) = C_1 [N]^{m_1}. \tag{15}$$

Owing to conservation of matter, $[X_1] + [X_2] = 1$, we
have $[X_1] = K_1/(1 + K_1)$ and $[X_2] = 1/(1 + K_1)$. Then,
for proteins in solvent conditions between water solution
and good solvent, the average fractal dimension is given by
a Hill function,

$$\alpha = \alpha(X_1)[X_1] + \alpha(X_2)[X_2] = 1 + \frac{1}{1 + C_1 [N]^{m_1}}, \tag{16}$$

with $[N] \in [0, 1]$ and $C_1 > 1$.

Similarly, we can study proteins changing from the $X_2$
state to the $X_3$ state, i.e., from water solution to poor
solvent, which can be done by adding polar molecules ($P$).
This process is described as

$$X_2 \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} X_3. \tag{17}$$

Thus, in the equilibrium state,

$$[X_3]/[X_2] = k_2/k_{-2} = K_2([P]) = C_2 [P]^{m_2}, \tag{18}$$

where $[P]$ is the concentration of polar molecules added,
normalized to lie in $[0, 1]$. As $[X_2] + [X_3] = 1$, we
have $[X_2] = 1/(1 + K_2)$ and $[X_3] = K_2/(1 + K_2)$. For
proteins in solvent conditions from water solution to poor
solvent, the average fractal dimension is given by

$$\alpha = \alpha(X_2)[X_2] + \alpha(X_3)[X_3] = 3 - \frac{1}{1 + C_2 [P]^{m_2}}, \tag{19}$$

with $[P] \in [0, 1]$ and $C_2 > 1$.
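
The closed forms in Eqs. (16) and (19) follow by inserting the equilibrium fractions into the weighted averages. Spelled out as a short check, using only the definitions above:

$$\alpha = \frac{1 \cdot K_1 + 2 \cdot 1}{1 + K_1} = \frac{(1 + K_1) + 1}{1 + K_1} = 1 + \frac{1}{1 + K_1}, \qquad \alpha = \frac{2 \cdot 1 + 3 \cdot K_2}{1 + K_2} = \frac{3(1 + K_2) - 1}{1 + K_2} = 3 - \frac{1}{1 + K_2}.$$

Both expressions give $\alpha = 2$ when no molecules are added ($K_1 = 0$ or $K_2 = 0$), and approach $\alpha = 1$ and $\alpha = 3$ respectively as the equilibrium constants grow.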

Taking a linear relationship between hydrophobicity and
polarity, $h = (1 - [N])/2$ and $h = (1 + [P])/2$, we can
fit the statistical data by Eqs. (16) and (19) with
appropriate values of $m_1$, $m_2$, $C_1$, $C_2$ (Fig. 7).
FIG. 7: Statistical data for $\alpha = \alpha(h)$, obtained
through the inverse transform of the data shown in Fig. 4
by Eq. (9). The fitting curves are given by Eqs. (16) and
(19), with $m_1 = 8$, $m_2 = 5$, $C_1 = 3^8$, $C_2 = 2^5$.



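A suggested function $\alpha = \alpha(h)$ is given by

$$\alpha(h) = \begin{cases} 1 + \dfrac{1}{1 + 3^8 (1 - 2h)^8}, & \text{for } h \in [0, 0.5], \\[2ex] 3 - \dfrac{1}{1 + 2^5 (2h - 1)^5}, & \text{for } h \in (0.5, 1]. \end{cases} \tag{20}$$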
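
A minimal sketch of the suggested function, combined with the relation $\nu = (\alpha + 2)/(5\alpha)$ of Eq. (9), is given below. The function names are hypothetical; the default parameters are the fitted values quoted for Fig. 7 ($m_1 = 8$, $m_2 = 5$, $C_1 = 3^8$, $C_2 = 2^5$).

```python
def alpha_of_h(h, m1=8, m2=5, c1=3 ** 8, c2=2 ** 5):
    """Suggested fractal dimension as a piecewise Hill-type function of h."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    if h <= 0.5:
        # Hydrophilic side: equivalent nonpolar concentration [N] = 1 - 2h
        return 1.0 + 1.0 / (1.0 + c1 * (1.0 - 2.0 * h) ** m1)
    # Hydrophobic side: equivalent polar concentration [P] = 2h - 1
    return 3.0 - 1.0 / (1.0 + c2 * (2.0 * h - 1.0) ** m2)

def nu_of_h(h):
    """Scaling exponent obtained by inserting alpha(h) into (alpha + 2) / (5 alpha)."""
    a = alpha_of_h(h)
    return (a + 2.0) / (5.0 * a)

for h in (0.0, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0):
    print(f"h={h:.2f}  alpha={alpha_of_h(h):.3f}  nu={nu_of_h(h):.3f}")
```

The two branches meet at $h = 0.5$ with $\alpha = 2$, and the large exponents $m_1$, $m_2$ produce the plateau around $\nu \approx 2/5$ for intermediate hydrophobicity.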



V. CONCLUSION






In summary, we have derived a unified formula for the
scaling law between the radius of gyration and the length
of a homopolymer chain. It shows that the scaling exponent
is generally correlated with the fractal dimension of a
chain under a given solvent condition. Our new formula
covers the well-known Flory theory for polymers under good
and poor solvent conditions as two limiting cases. It can
also be applied to proteins under physiological conditions
($\nu \approx 2/5$), with a predicted fractal dimension
$\alpha \approx 2$. The influence of hydrophobicity on the
compactness of a protein has also been studied through a
simple H-P model. By considering the equivalence between
protein-solvent coupled systems, we turn a physical process
(varying the hydrophobicity of a chain by amino acid
mutation) into a chemical process (varying the polarity of
the solvent by adding polar or nonpolar molecules). This
enables us to derive a functional relation between
hydrophobicity and fractal dimension, in reasonable
agreement with statistical data. This relation should be
helpful for protein structure prediction. Our results
indicate that proteins may share the same basic principle
with homopolymers, despite their specificity as
heteropolymers. We hope this work can shed light on the
mechanism of protein folding and the stability of protein
structures.
FIG. 8: The statistical data are the same as in Fig. 4. The
fitting curves are given by Eqs. (A1) and (A2), with
$m_1 = m_2 = 5$, C1 = 3^, C2 = 2.2^



APPENDIX A: DIRECT STUDY OF $\nu(h)$

Although we can get $\nu(h)$ from Eqs. (9) and (20), a
direct prediction is also possible in the same way as in
Sec. IV. Suppose $\nu(X_1) = 3/5$, $\nu(X_2) = 2/5$,
$\nu(X_3) = 1/3$; the average scaling exponent is given by
A suggested function $\nu = \nu(h)$ is given by



for $h \in [0, 0.5]$